Remove MemoryPlan from VM passes #7361
Conversation
// incomplete to provide memory resuse optimizations. Disable it until we can
// rewrite it in C++ and complete it.
// // Perform memory planning in order to coalesce/reduce allocations.
// pass_seqs.push_back(transform::MemoryPlan());
Can we have some benchmark data from dynamic models, such as tf ssd/rcnn, to show the performance impact of disabling this pass?
Based on my experience, the performance difference with and without this pass is not evident, at least for the BERT case.
@icemelon9 Great to see you working on BERT. I've recently been trying to optimize BERT with TVM, especially the dynamic-batch case. I found that the Relay VM solution introduces a lot of small PackedFuncs for alloc_storage's size calculation, which makes the VM slower. Do you have any relevant experience or ideas to share on this? Thanks. More detailed discussion: https://discuss.tvm.apache.org/t/guideline-relay-aot/5977/17?u=monklof
cc @zhiics @icemelon9
Can we simply use the pass infra to disable it, like:
We could, but we've tested a few ONNX and PyTorch models and don't see any performance differences, and @jroesch tells me this pass was the first half of a plan to do graph-runtime-like memory reuse, but the second half was never implemented. Unless we can find a use case where it helps, I think it makes more sense to disable it entirely until we can get the full feature working.
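For readers unfamiliar with what "graph-runtime-like memory reuse" means here, the idea can be illustrated with a toy sketch (this is not TVM's implementation): instead of always allocating fresh storage, an allocator serves requests from a pool of previously freed buffers when one of sufficient size exists.

```python
# Toy illustration (NOT TVM's implementation) of memory reuse: allocations
# are served from a pool of previously freed buffers when one of sufficient
# size exists, instead of always requesting fresh storage.
class StoragePool:
    def __init__(self):
        self.free = []        # sizes of buffers released back to the pool
        self.allocated = 0    # total bytes ever requested from the system

    def alloc(self, size):
        # Reuse the smallest free buffer that fits, if any.
        fits = [s for s in self.free if s >= size]
        if fits:
            buf = min(fits)
            self.free.remove(buf)
            return buf
        self.allocated += size
        return size

    def release(self, size):
        self.free.append(size)

pool = StoragePool()
a = pool.alloc(64)   # fresh allocation: 64 bytes from the system
pool.release(a)      # buffer goes back to the pool
b = pool.alloc(32)   # reuses the 64-byte buffer rather than allocating anew
assert pool.allocated == 64
```

Without the (unimplemented) liveness-analysis half, buffers cannot safely be released early, which is why the comment below notes peak usage may actually be lower with the pass disabled.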
@mbrookhart Thanks. It would be great if you could try tf ssd and fasterrcnn so that we can ensure there is no regression for tf models as well.
LGTM. I'll leave the merge decision to @kevinthesun.
@kevinthesun I'd be happy to do some testing. Do you have scripts for running those models? I'm not finding them in the tutorials.
@mbrookhart Sure. You can refer to the tf ssd integration test.
Okay, for TF-SSD I timed 10 calls to `vm.invoke` after all of the compilation happened and averaged them, running on a Ryzen 5950X. main:
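For reference, a benchmarking loop like the one described above can be sketched as follows; `invoke` here is a hypothetical stub standing in for the real `vm.invoke` call, which requires a compiled model and is not reproduced in this snippet.

```python
import time

def invoke():
    # Placeholder workload; in practice this would be vm.invoke("main", *inputs).
    time.sleep(0.001)

def average_latency(fn, warmup=3, runs=10):
    """Time `runs` calls to `fn` after `warmup` untimed calls; return mean seconds."""
    for _ in range(warmup):
        fn()  # exclude compilation / first-call overheads from the measurement
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

mean_s = average_latency(invoke)
print(f"mean latency: {mean_s * 1000:.2f} ms")
```

Warming up before timing matters here because the comment above explicitly excludes compilation time from the measurement.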
LGTM
I'm not sure we have the infra for this, but going forward it would also be interesting to see the memory usage differences with and without this pass, since it affects allocation behavior. I imagine our peak usage would actually decrease at the moment without the pass, since the liveness analysis phase hasn't been implemented and we are keeping memory around longer than needed.
Thanks @mbrookhart @masahi @icemelon9 @zhiics |
@jroesch @masahi
As discussed, this disables MemoryPlan in the VM until we can rewrite it to do full reuse planning. The current pass slows down compilation significantly without providing a clear runtime performance benefit.
This PR also removes a debug print that snuck into another part of the VM.